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ABSTRACT 

This  report  describes  the  design  and  implementation  of  an 
automatic  integrator  for  systems  of  ordinary  differential  equations 
for  use  on  the  ILLIAC  IV  computer  system.  This  integrator  accepts  a 
simple  statement  of  the  differential  equations  and  their  initial  values 
and  produces  an  ILLIAC  IV  Assembly  Language  code  which  can  be  used  to 
solve  them.   The  intent  is  to  produce  a  code  which  will  be  efficient  on 
a  large  parallel  computer. 
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1.   INTRODUCTION 

Standard  methods  of  integrating  ordinary  differential  equations 
consist  essentially  of  incrementing  the  independent  variable  by  some 
small  amount  and  then,  by  using  past  values  and  derivatives  of  the  de- 
pendent variable,  predicting  the  new  value  of  that  dependent  variable. 
Thus  the  integration  proceeds  from  the  given  initial  value  of  the  de- 
pendent variable  to  some  user  specified  stopping  point,  building  on 
itself  as  it  goes.   The  same  method  is  applied  to  systems  of  ordinary 
differential  equations;  but,  of  course,  one  cannot  integrate  one  equa- 
tion completely  and  then  the  next,  since  in  general,  the  derivative  of 
each  dependent  variable  will  depend  on  the  other  dependent  variables 
of  the  system.   Each  equation  must  be  integrated  at  a  given  value  of 
the  independent  variable  before  going  to  the  next  step. 

For  this  reason,  to  perform  an  integration  of  a  system  on  a 
large  parallel  computer  like  ILLIAC  IV,  it  is  desirable  to  perform  the 
integration  on  all  of  the  equations  simultaneously.   Difficulties 
arise  when  the  equations  are  not  identical  in  form;  and,  of  course,  in 
general  they  are  not.   Integration  methods  consist  of  one  or  more  func- 
tion evaluations;  that  is,  substituting  the  present  value  of  the  depen- 
dent variable  into  the  equation  to  find  its  derivative  at  the  given 
point.   This  is  the  part  that  is  most  difficult  to  do  simultaneously 
with  dissimilar  equations,  and  it  is  the  part  which  usually  accounts 
for  much  of  the  computation  time,  even  on  a  serial  machine. 


The  following  example  will  illustrate  some  of  the  problems  to 
be  solved: 

yl  =  a  yi  y2  +  b  yi  +  C  yl  yl/  y2 
y2  =  d  y2  +  e  yl  y2  +  f  yi  y2 

The  logical  way  to  proceed  would  seem  to  be  to  try  to  minimize 
the  number  of  nonparallel  arithmetic  operations  which  must  be  done.   For 
an  example  of  how  this  might  be  done,  consider  a  two  processing  element 
machine  with  each  processor  working  on  a  single  equation.  Assume  that 
by  allocating  storage  properly  and  by  using  some  indexing  (to  be  explained 
later),  it  is  possible  to  access  constants  and  different  dependent  vari- 
ables simultaneously.   The  operation  sequence  would  then  be  as  follows 
(parentheses  indicate  memory  access,  I  indicates  processor  idle,  arith- 
metic operations  indicated  by  their  symbols). 

PI  (a)  xx  (b)  x  I  +  (c)  x  x  -  + 
P2  (d)  x  I  (e)  x  x  +  (f)  x  x  I  + 

If  the  terms  can  be  rearranged  so  that  the  first  term  of  the 
first  equation  and  the  second  term  of  the  second  equation  can  be  done 
at  the  same  time,  some  more  savings  can  be  gained. 

PI  (a)  x  x  (b)  x  +  (c)   x  x  -  + 
P2  (e)  x  x  (d)  x  +  (f)   x  x  i  + 


Thus  more  operations  are  done  in  parallel  and  the  processors 
are  idle  fewer  times.  With  larger  systems  of  equations  there  are  more 
terms  that  can  be  done  in  parallel  and  more  comparisons  to  be  made. 
With  rearrangement,  unmatched  terms  could  also  be  done  partly  in  parallel. 
With  the  larger  number  of  terms  in  big  systems,  it  is  reasonable  to 
expect  that  the  relative  savings  will  be  greater  than  those  in  this 
example. 

Another  problem  would  arise  if  one  of  the  equations  had  sig- 
nificantly more  terms  than  the  other.   Some  method  of  sharing  the  work 
between  processors  should  be  used.  As  the  system  of  equations  gets 
larger  in  number  of  terms  and  number  of  equations,  all  of  these  problems 
become  more  and  more  complex. 

The  translator-integrator  program  described  in  this  thesis 
tries  to  do  these  types  of  optimizations  as  well  as  compiling  the 
Assembly  code  to  evaluate  the  input  equations.   The  translator-integrator 
(which  will  be  referred  to  as  the  integrator  in  the  following  pages  - 
the  section  designated  should  be  clear  from  the  context)  has  associated 
with  it  some  skeleton  forms  of  ILLIAC  IV  Assembly  Language  programs  to 
solve  ordinary  differential  equations  by  different  numerical  methods. 
These,  along  with  the  compiled  code  and  generated  storage  pseudo-ops 
allows  the  program  to  produce  a  completed  Assembly  code  to  solve  the 
equations. 

The  translator-integrator  system  itself  is  an  Algol  program 
which  would  be  executed  on  the  Burroughs  65OO  in  the  ILLIAC  IV  system. 
A  schematic  of  the  integrator  is  shown  in  Figure  1. 


Section  2  describes  the  integrator  from  the  user's  point  of 
view.  Section  3  is  a  discussion  of  the  implementation  techniques  and 
numerical  methods  used.  Section  h  discusses  the  capabilities  of  this 
system  and  the  further  work  which  might  be  done. 


Differential  Equations 
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Figure  1.   Schematic  Diagram  of  Integrator  System 


2.  USE  OF  THE  INTEGRATOR  SYSTEM 

The  primary  object  of  any  automatic  integration  system  is 
to  provide  the  user  with  a  device  to  solve  his  equations  with  a  minimum 
of  effort  and  analysis.   From  the  user's  point  of  view,  the  integrator 
input  is  as  defined  in  Backus  Normal  Form  (BNF)  in  Figure  2.  Most  of 
this  should  be  self-explanatory.   Dependent  variables  are  expressed  as 
Y(N);  their  derivatives  as  Z(N),  and  the  independent  variable  as  T. 
Each  differential  equation  is  written  as 


z(n)  =  fn  (y(i),  y(2),  .  .  .  ,  t) 


The  functions  can  be  almost  any  Algol  expression  (an  obvious 
exception  is  an  if  clause)  as  defined  by  the  BNF  and  is  written  in  B65OO 
Algol  notation.   It  should  be  noted  that  constants  whose  values  can  be 
assigned  later  are  allowed  in  these  expressions.   The  initial  values  of 
the  dependent  variables  and  the  starting  and  final  values  of  the  inde- 
pendent variable  are  then  given.   These  are  real  numbers  expressed  in 
Algol  notation.  All  numbers  except  indexes  are  treated  as  6U-bit 
floating  point  numbers  by  the  integrator.   It  is  not  necessary  to  ter- 
minate whole  numbers  with  a  decimal  point  to  insure  that  they  are 
floating  point.   TO  and  TF  are  the  starting  and  ending  values  of  T,  the 
independent  variable. 

The  user  can  also  either  specify  a  stepsize  to  be  used  or 
specify  a  desired  maximum  error  which  the  integrator  will  try  to  stay 


<program>  :  :=  differential  equations>  <initial  values>  <constant  declarations> 

<other  statements>  END 
<differential  equations>  : :=  Z(  <integer>  )  =  <express  i  on>  ; 

<differential  equations>  Z(  <integer>  )=  <expression>  ; 
<initial  vaTues>  : :=  Z0(  <integer>  )  =  <real  number>  ; 

<initial  values>  Z0(  <integer>  )  =  <real  number>  ; 
<constant  declarations>  :  :=  <identifier>  =  <real  number>  ; 

<constant  declarations>  <identifier>  =  <real  number>  ; 
<other  statements>  : :=  <statement>  ;  |  <other  statements>  <statement>  ; 
<statement>  :  :=  TO  =  <real  number>  |   TF  =  <real  number>  |  STEP  =  <real  number> 

ERROR  =  <real  number> 
<expression>  :  :=  <term>  |  <add  operator>  <term>  |  <expression>  <add  operator> 

<terni> 
<add  operator>  : :=  +  <    - 

<term>  :  :=  <factor>   |  <term>  <multiply  operator>  <factor> 
<multiply  operator>  : :=  x  |  / 
<factor>  : :=  <primary>  |  <factor>  *   <primary> 
<primary>  : :=  <real  number>  |  <identifier>  |  Y(  <integer>  )     T 

<funid>  (  <expression>  )     (<expression>) 
<funid>  : :=   SIN   I    COS   I    EXP   I    LOG 


Figure  2.   Input  Language  Description  in  BNF 


within.  Also  any  constants  which  were  used  in  the  equations  should  he 
defined.   The  program  input  is  terminated  with  an  END. 

In  general  the  input  is  free  form  with  semicolons  separating 
statements.  In  fact,  a  missing  semicolon  will  not  always  cause  an  error, 
but  it  is  suggested  that  the  user  try  to  make  the  input  no  more  diffi- 
cult to  recognize  than  necessary.   Identifiers  are  limited  to  six 
characters;  longer  ones  are  truncated  on  the  right.   Numbers  are 
limited  to  eighteen  characters  including  sign  and  exponent.  All  blanks 
are  ignored. 

From  this  minimal  amount  of  information  an  Assembly  Language 
Code  will  be  produced  and  placed  as  a  file  on  disk.   If  some  information, 
such  as  an  initial  value  or  constant  declaration  is  omitted,  a  code 
will  still  be  produced  with  zeros  in  the  appropriate  locations.   In 
Figure  3,  a  sample  program  for  the  example  discussed  in  the  introduction 
is  shown  as  it  would  be  inputted  to  the  translator.  A  larger  example, 
complete  with  output,  is  given  in  the  Appendix. 

The  integrator  also  has  some  control  words  which  can  be  used 

if  desired.   These  are  entered  at  the  beginning  of  the  input.  The  string 

of  control  words  separated  from  each  other  by  at  least  one  space  must  be 

preceded  by  a  '#'  and  a  space.  The  control  words  are  as  follows: 

TEST  -  Do  not  generate  Assembly  Language  -Code.   This 
control  word  is  used  when  the  user  is  only 
interested  in  knowing  what  efficiency  can  be 
achieved  with  his  equations. 


#  RUNGE  ERRORCHECK     ; 
Z(l)=  AxY(l)xY(2)  +  BxY(l)   +  CxY(l)xY(l)/Y(2)   ; 
Z(2)=  DxY(2)+ExY(l)xY(2)  +  FxY(l)xY(2)   ; 
ZO(l)=  0  ; 
Z0(2)=  2.3  ; 
A  =  3.^56  ; 
B  =  0  ; 
C  =  1  ; 

D  =  3-^5677U836^5@-09  ; 
E  =  I+.9  ;   F  =   .33  ', 
TO  =  0.0  ;  TF  =  1.0  ; 
ERROR  =  .0001   ; 
END 


Figure  3-  Sample  Input 
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RUNGE  -  Use  a  Fourth-Order  Runge  Kutta  method  to 
integrate  the  equations.   This  is  the 
default  option. 

CORPRE  -  Use  a  Corrector-Predictor  method  to  integrate 
the  equations.  This  is  a  more  complicated 
method  but  could  be  used  on  more  difficult 
equations. 

ERRORCHECK  -   Estimate  truncation  error  and  adjust 
stepsize  accordingly.  This  control  word 
should  be  used  when  an  error  bound  is  to  be 
put  on  the  results  as  previously  described. 

CONSTMAP  -  This  prints  out  a  chart  showing  the  location 
of  constants  in  PE  memory,  so  that  they  can  be 
changed  without  reexecuting  the  integrator. 

FUNMODE  -   If  the  terms  in  the  differential  equations 
are  predominantly  made  up  of  parenthesized 
expressions  or  function  calls  (i.e.,  SIN, 
COS,  etc.)  or  both,  then  this  option  should 
be  turned  on  to  improve  the  efficiency  of 
the  code  generated. 

PATTERN  -  The  integrator  makes  some  of  its  decisions 
based  on  the  supposition  that  the  terms  in 
the  first  few  equations  are  representative 
types  for  the  whole  system.   If  this  is  not 
so  for  a  particular  system,  the  efficiency 
may  be  improved  by  using  this  control  word. 
When  this  option  is  used,  the  first  equation 
should  contain  terms  typical  of  the  system  and 
in  about  the  same  ratio  to  the  other  terms  in 
the  equation  as  they  occur  in  the  system.   No 
code  will  be  generated  for  this  equation,  but 
it  will  aid  the  integrator  in  allocating  the 
work  to  be  done. 

The  Assembly  Language  Program  which  is  produced  also  has 

some  comments  in  it  so  that  with  the  information  provided  by  the 

CONSTMAP  option  of  the  integrator,  the  user  should  be  able  to  make  small 

changes,  such  as  the  values  of  constants  or  initial  values,  without 

rerunning  the  program  if  desired.  An  alternative  to  this  would  be  to 
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read  in  data  at  execution  time.  When  the  input  conventions  for  ILLIAC 
IV  are  defined,  the  integrator  can  easily  be  modified  to  load  the  data 
into  the  correct  place. 
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3 •   IMPLEMENTATION 
3.1  An  Overview  of  the  System 

3.1.1  The  Recognizer 

The  recognizing  part  of  the  integrator  program  was  written 
using  the  ILLIAC  IV  Translator  Writing  System  (TWS).  The  input  to  one 
of  the  TWS  programs  is  a  Backus  Normal  Form  (BNF)  description  of  the  in- 
tegrator input  language  syntax.  This  program,  which  runs  on  the  Burroughs 
5500,  generates  an  Algol  scanner  and  parser  which  can  be  used  to  recog- 
nize programs  written  in  the  language  defined  by  the  input  BNF.  An 
action  number  can  be  associated  with  each  production  to  execute  a 
designated  semantic  routine.  A  TWS  program  is  also  provided  to  con- 
catenate these  semantic  routines  to  the  parser.  While  this  program 
also  provides  several  built-in  routines  designed  to  help  write  compilers, 
few  of  these  were  actually  used  due  to  the  specialized  nature  of  the 
integrator  language.  All  of  the  analysis,  reduction  and  code  generation 
for  the  integrator  is  done  within  the  semantic  routines. 

The  actual  input  to  the  TWS  programs  is  somewhat  different 
than  that  given  in  Figure  2.  Action  numbers  are  required,  and  the  BNF 
description  is  slightly  different  to  aid  the  syntactic  analysis. 

3.1.2  The  Semantic  Routines 

Before  the  parsing  of  the  input  begins,  a  short,  fast  prepass 
is  made  in  the  initial  semantic  routine  to  count  the  number  of  equations 
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and  number  of  terms  (by  counting  arithmetic  operators).   Since  the 
input  to  the  integrator  may  be  quite  large,  it  is  necessary  to  do 
optimization  and  the  related  compilation  and  storage  allocation  as  the 
program  proceeds,  so  as  to  limit  the  amount  of  B65OO  storage  needed. 
The  information  gleaned  from  this  prepass  is  needed  to  do  this  dynamic 
analysis. 

As  the  input  differential  equations  are  parsed,  an  inter- 
mediate code  for  evaluating  them  is  generated.   The  optimization  is 
then  performed  on  a  term  by  term  basis  on  this  intermediate  code. 
Storage  assignment  begins  at  the  first  processing  element  and  then 
proceeds  according  to  the  types  of  terms  and  equations  involved  in  the 
problem.   Other  types  of  statements  such  as  initial  values  and  constant 
declarations  are  recognized,  and  the  information  is  saved.  When  the 
entire  input  has  been  recognized,  all  the  necessary  information  for 
Assembly  code  generation  has  been  formed.  At  this  point  the  appropriate 
skeleton  ILLIAC  IV  integration  program  is  selected,  and  the  storage 
declarations  and  the  code  for  evaluating  the  equations  are  inserted  in 
the  proper  places  to  form  a  completed  Assembly  Language  Code.  This  is 
left  as  a  disk  file  which  can  be  assembled  to  machine  code. 

3.2  Methods  for  Improving  Parallelism 

3.2.1  Possible  Instructions  and  Operands 

The  instructions  used  to  evaluate  an  equation  are  a  small  and 
standard  set;  for  example,  load  register,  add  to  register,  multiply, 
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store,  etc.   Thus  the  operands  that  will  be  used  also  constitute  a 
fairly  limited  set.   In  general,  they  can  be  divided  into  the  following 
classifications : 

1.  CONSTANTS  -  These  values  do  not  change  during 

the  integration.   They  may  have  been 
declared  either  as  literals  or  by 
constant  declaration  statements. 

2.  DEPENDENT  VARIABLE  -  These  have  an  index  value 

associated  with  them  declared  in 
input  language  as  Y  (<index>). 

3-   FUNCTION  CALLS  -  These  include  any  of  the 

explicitly  allowed  functions  such 
as  SIN  and  COS. 

k.      INDEPENDENT  VARIABLE  -  This  is  designated  by  T 
in  the  input  language . 

5-  REGISTER  NAME  -  These  occur  in  register  to 
register  instructions  used  for 
temporary  storage  of  data. 

6.  NO  OPERAND  -  This  is  for  such  instructions  as 
change  the  sign  of  the  A  register 
(CHSA). 


3.2.2  Doing  Similar  Instructions  in  Parallel 

First  consider  the  same  instruction  operating  on  different 
operands  which  fall  into  the  same  operand  class.  Methods  must  be 
developed  for  doing  these  in  parallel. 

When  the  operand  is  a  constant,  as  many  consecutive  rows  of 
PE  memory  as  needed  are  blocked  off.  When  the  Assembly  code  is  gen- 
erated, instructions  which  access  constants  are  given  operand  fields 
such  that  they  point  to  consecutive  rows  in  this  array.   Each  processor 
can  have  a  different  value  in  its  memory  position  in  the  constant  row. 
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It  is  a  simple  matter  for  the  integrator  to  take  the  values  defined  as 
literals  or  by  constant  declarations  and  insert  them  in  the  correct  row 
and  column  so  they  are  accessed  by  the  correct  instruction. 

The  values  of  the  dependent  variables  are  not  stored  in  the 
same  way  as  constants  since  they  must  be  updated  constantly  throughout 
the  execution  of  the  integration  to  be  performed.   The  values  of  the 
independent  variables  that  are  needed  are  kept  in  each  PE  in  a  column. 
The  way  to  access  different  Y's  in  parallel  is  to  associate  a  tag  row 
with  each  access  in  a  way  similar  to  that  used  for  constants.   In  this 
case  the  tag  row  is  loaded  into  the  X  register  when  needed  and  a  load 
modified  by  index  is  made  from  the  independent  variable  storage  area. 
While  this  costs  an  instruction  to  load  the  X  register,  it  really  only 
costs  /m,  where  m  is  the  number  of  equations,  since  the  indexes  for  all 
of  the  variables  are  loaded  at  once.  Also  in  a  serial  machine  these 
variables  would  often  be  declared  as  an  indexed  array. 

At  the  present  stage  of  design,  it  is  not  possible  to  execute 
different  function  calls  in  parallel,  since  a  function  call  is  in  reality 
a  reasonably  large  set  of  instructions  to  be  inserted  in  this  position, 
and  the  processing  elements  must  execute  the  same  instructions.  Since 
most  functions  form  some  sort  of  polynomial  approximation,  it  would  be 
a  reasonable  extension  of  the  system  to  have  the  integrator  provide  its 
own  library  functions  which  have  been  written  so  they  are  as  parallel 
as  possible  and  thus  gain  some  speed. 

Since  there  is  only  one  independent  variable,  instructions 
with  it  as  an  operand  can  always  be  performed  in  parallel.  The  value  is 
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kept  in  all  PE's  in  the  same  memory  row  and  updated  when  appropriate. 
No  operand  instructions  can  clearly  always  be  done  in  parallel. 

Instructions  with  different  register  names  as  operands 
clearly  cannot  be  done  in  parallel.   Since  these  only  occur  in  tempo- 
rary storage  manipulation  according  to  a  well-defined  algorithm, 
similar  terms  will  probably  access  the  same  registers  anyway.  Also, 
due  to  the  method  used  for  sharing  terms  described  in  a  later  section, 
little  temporary  storage  will  be  used. 

3.2.3  Term  by  Term  Analysis 

In  order  to  compare  equations  to  see  what  can  be  done  in 
parallel,  it  is  necessary  to  break  them  down  in  to  more  convenient 
and  hopefully  more  similar  parts.   The  unit  selected  was  that  of  a  term 
since  these  can  be  shared  to  other  processors  and  then  recombined  with 
just  an  addition  operation.   The  expressions  representing  the  differ- 
ential equations  are  only  broken  down  into  the  first  level  of  terms; 
that  is,  terms  inside  parentheses  are  not  separated.   Some  examples  are 
listed  below.   The  expressions  on  the  left  are  broken  down  into  the 
terms  (separated  by  commas)  on  the  right.   The  expression  can  be  re- 
formed by  just  adding  the  terms  on  the  right. 


A  +  B  +  C   ►    A,  B,   C 

A  +  B  -  C   — ►    A,  B,  -C 

A  +  (B-C)    ►    A,  (B-C) 

AxY(l)  +  BxY(2)  x  Y(3)  +  SIN  (Y(U)) 


AxY(l),  BxY(2)  x  Y(3),  SIN  (Y(k)) 
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Of  course,  one  can  certainly  write  separate  terms  according 
to  this  rule  which  differ  tremendously,  but  it  is  reasonable  to  expect 
that  in  a  large  system  of  equations  many  terms  will  be  similar.  Further 
discussion  of  the  integrator  should  make  obvious  the  convenience  of 
this  division.  Thus  if  all  of  the  instructions  in  two  terms  are  the 
same  and  the  operands  fall  into  the  same  classes  as  defined  by  previous 
remarks,  then  these  terms  can  be  done  in  separate  PE's  simultaneously. 

3-2.4  Inserting  Dummy  Instructions 

In  actuality  it  cannot  be  assumed  that  all  terms  will  be  per- 
fect matches  with  other  terms.  Consider  the  following  terms: 

Ax  Y  (1)  x  Y  (3) 
Bx  Y  (3) 

They  differ  only  in  one  multiply;  and  if  no  other  better 
pairing  is  available,  this  can  be  treated  as  Bx  Y  (3)  x  1.   In  actuality 
it  is  treated  as  Bx  Y  (3)  x  Y  (0)  where  Y  (0)  always  contains  a  one. 
Thus  these  terms  can  be  paired.   Other  dummy  instructions  can  be  in- 
serted as  follows : 

Y  (1)  x  B   "\     Y  (1)  x  B 

Y  (2)        /"    Y  (2)  x  1 

SIN  (A  +Y(1)^     SIN  (A  +  Y(l)) 
SIN  (B)      /     SIN  (B  +  0) 


(C  +  0) 


Inserting  dummy  operands  for  instructions  involving  other 
than  constants  and  dependent  variables  does  not  appear  worthwhile  since 
extra  instructions  would  be  needed.   Since  a  function  call  usually  in- 
volves several  instructions,  it  is  not  advisable  to  have  a  dummy  call 
unless  no  better  match  can  be  found. 

3.2.5  Sharing  Terms 

As  previously  mentioned,  one  of  the  important  problems  is  that 
the  equations  may  vary  greatly  in  the  number  of  terms  in  each.  Some 
method  is  needed  to  share  the  work  more  or  less  equally  between  pro- 
cessors. This  is  the  place  where  the  information  gained  in  the  prepass 
becomes  necessary.  A  limit  is  set  on  the  number  of  terms  to  be  evaluated 
in  each  PE  according  to  the  following  equation: 


maximum  number       I  total  number  of  terms 


E 


of  terms  /  PE       total  number  of  PE's 


The  square  brackets  indicate  the  smallest  integer  greater  than 
the  enclosed  value.   This  limit  serves  to  distribute  the  terms  equally 
across  the  PE's.  When  the  optimization  has  assigned  the  maximum  number 
of  terms  to  a  PE,  it  then  skips  to  the  next.  This  requires  that  a  tag 
be  associated  with  each  term  indicating  which  equation  it  belongs  to. 
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After  a  term  has  been  evaluated,  it  is  added  to  a  scratch  memory  loca- 
tion in  the  same  PE.   This  location  depends  on  the  tag  so  that  at  the 
end  of  the  evaluation,  there  may  be  several  separate  partial  sums  in  a 
particular  PE.  After  all  partial  sums  are  formed,  routes  and  adds  are 
performed  which  combine  partial  sums  and  return  them  to  the  correct  PE's. 
Also,  if  the  number  of  PE's  is  much  greater  than  (i.e.,  more  than  twice) 
the  number  of  equations,  the  integration  procedures  other  than  function 
evaluation  are  assigned  to  every  other  one  or  two  PE's  sequentially  so 
that  the  amount  of  routing  necessary  will  be  reduced.  Assigning  the 
PE's  sequentially  assures  that  the  routes  will  be  small  since  the  terms 
are  also  assigned  to  the  PE's  more  or  less  sequentially  within  the  range 
of  the  optimization  routine. 

3-3  The  Optimization  Routine 

The  optimization  routine  is  the  procedure  which  is  called 
when  a  term  has  been  recognized.   Its  purpose  is  to  determine  what  term 
this  just  recognized  term  can  be  done  in  parallel  with.   Flowcharts  are 
given  in  Figures  k-6.      There  are  three  basic  stacks.   COMPSTK  is  the 
stack  containing  the  intermediate  code  for  the  term  just  recognized. 
CODESTK  contains  the  intermediate  code  for  the  terms  which  are  to  be 
shared  over  the  processors  as  previously  discussed.   EXCODESTK  contains 
the  code  for  the  terms  which  cannot  be  matched  with  those  in  CODESTK. 
In  the  normal  state  of  operation,  terms  which  contain  function  calls  and 
parenthesized  expressions  are  put  in  this  stack  since  they  are  expected 
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to  be  the  exception  and  not  the  rule.  This  can  be  changed  by  the  user 
if  desired  with  the  control  word  FUNMODE.   Optimization  does  occur  with- 
in EXCODESTK  as  well  as  CODESTK. 

Basically  the  optimization  routine  executes  a  series  of  calls 
to  a  comparison  routine  with  different  arguments.  The  arguments  tell 
this  comparison  routine  which  stacks  to  compare  and  how  many  dummy  in- 
structions it  can  insert  in  each  in  order  to  make  terms  match.  The 
flowcharts  show  the  sequence  of  these  calls.   The  purpose  of  the  partic- 
ular order  is  to  find  the  match  which  requires  the  least  overhead  first. 
When  the  maximum  number  of  terms  allowed  in  CODESTK  is  reached,  the  data 
for  storage  generation  is  extracted  for  that  PE,  and  the  program  proceeds 
to  the  next.   CODESTK  and  EXCODESTK  are  overwritten  for  each  PE  since 
the  basic  instruction  and  operand  will  be  the  same  if  a  comparison  is 
achieved.   Only  the  storage  information  contained  in  the  intermediate 
code  is  different.   The  pattern  for  the  intermediate  code  given  below 
illustrates  this. 


bit  12     18 


^3 


A 

B 

C 

6  bits    2k   bits 


6  bits 


A  -  code  number  indicating  the  instruction 
(LDA,  ADRN,  .  .  .  ) 

B  -  storage  information 

C  -  type  of  operand 
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If  C  is 


->  then  B  contains 


No  operand 
Constant 

Dependent  Variable 
Independent  Variable 
Function  Call 
Register  to  Register 


Arithmetic  computation 
temporary  storage  - 


>  0 

-►  symbol  table  pointer 


■^   index  value 


*  0 


"^     Name  of  function 


"^   Source  register  i 


name 


+■     index  within  storage 


Exponentiation  temporary 

storage  ►  index  within  storage 


The  code  remaining  when  the  final  equation  has  been 
optimized  is  used  to  form  the  ILLIAC  IV  code  itself. 

An  example  of  the  previously  discussed  case  follows: 


EQUATIONS: 


Z(l)  =  AxY(l)  x  Y(2)  +■  BxY(l)  +  CxY(l)  x  Y(2)  /  Y(2) 
Z(2)  =  DxY(2)  +  ExY(2)  x  Y(2)  +  FxY(l)  x  Y(2) 


DETERMINED  IN  PASS  1: 

MAXIMUM  TERMS  PER  PE     =3 
NUMBER  OF  EQUATIONS      =  2 
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NOTE:   ASSUMING  A  2  PE  SYSTEM 
PASS  2: 


1.   FIRST  THREE  TERMS  ARE  READ  IN  TO  FILL  UP  CODESTK. 
STORAGE  ALLOCATION  FOR  THE  FIRST  PE  IS  DONE. 


TERM  #3 


*~x~*  *** 


A  A  A  A  A  A 


CODESTK 


TERM  #2 


TERM  #1 


2.   READ  NEXT  TERM  IN  INPUT  AND  PUT  IN  COMPSTK. 

x  Y(2) 
D 


**-*-*-** 


COMPSTK 


3.   COMPARE  COMPSTK  TO  CODESTK  WITH  NO  DUMMY  INSERTIONS 
ALLOWED o 


k.     FIND  A  MATCH  WITH  TERM  2. 
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5.   COMPSTK  IS  OVERWRITTEN  INTO  CODESTK.   FLAGS  ARE 
SET  INDICATING  THAT  THIS  TERM  IS  TAKEN. 


6.   NEXT  TERM  IS  READ  INTO  COMPSTK. 

x  Y(2) 

x  Y(l) 

E 

xxxxxx 


COMPSTK 


COMPARE  COMPSTK  TO  CODESTK  WITH  NO  DUMMY  INSERTIONS 
ALLOWED. 


8.   FIND  A  MATCH  WITH  TERM  #1. 


9.   OVERWRITE  AS  IN  STEP  5. 


10.   READ  IN  NEXT  TERM. 

x  Y(2) 
x  Y(l) 

F 


XXXX-tt-X- 


COMPSTK 


11.   COMPARE  WITH  REMAINING  TERM  IN  CODESTK  -  NO  INSERTIONS. 


12.   FAILURE  -  TRY  TO  COMPARE  ALLOWING  ONE  INSERTION 


13.   SUCCESS 


2k 


I  Y(0) 
x  Y(2) 
x  Y(l) 


Note:  Y(0)  always  contains 
a  1. 


V   V    V    V    V    d 
JTWIT  A    K  A 


COMPSTK 


1^.      OVERWRITE  AS  IN  STEP  5. 

/  Y(0) 

x  Y(2) 

x  Y(l) 

F 

x  Y(2) 

D 

%*  \*  %»  \»  %+  \t 
™  A  A   A  A   A 

x  Y(2) 
x  Y(l) 

E 

V/    \I     \S    \f    \f     1/ 

A  A    A  A  TV  A 


CODESTK 


15.   USE  CODESTK  TO  ALLOCATE  STORAGE  FOR  THIS  PE. 


25 


ENTRY 


NO 


DOES  COMPSTK 
CONTAIN  A  FUNCTION 
CALL  OR 
PARENTHESIZED 
EXPRESSION 


COMPARE 
COMPSTK  AND 
EXCODESTK 
SUCCESSFUL? 


COMPARE 

COMPSTK 

AND 

CODESTK 

SUCCESSFUL 

IN  THIS 

PE?      A 


NO 


SUCCESSFUL 
IN  NEXT 
PE? 


YES 


NO 


COUNT  AS 

ONE 

FAILURE 


YES 


YES 


SAVE  COMPSTK 
UNTIL  NEXT 
PE 


OVERWRITE 
COMPSTK  INTO 
CODESTK 


SUCCESSES   + 
FAILURES  > 
MAXIMUM   # 
TERMS  PER 
PE? 


NO 


NO 


ADD 

COMPSTK 
TO  END 
OF 
EXCODESTK 


EXIT 


OVERWRITE 
COMPSTK  INTO 
EXCODESTK 


-►EXIT 


■►EXIT 


YES 


SAVE  STORAGE 

DATA  FOR  CODESTK 

AND  EXCODESTK. 

INCREMENT  PE 

COUNTER. 

INSERT  ANY  CODE 

IN  LOOK  AHEAD  FROM  LAST 


PE. 


Figure  k.      Optimization  Routine 
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NO 


L — ► 


NO 


I  NO 
J ► 


IS  THERE 
A  MATCH 
WITH  NO 
INSERTIONS 


JffiS_ 


NO 


MATCH  WITH 
1  DUMMY 
INSERTION 
INTO  COMPSTK? 


2, 


N-l? 


N 


JES_ 


_YES_ 


J£ES_ 


NO 


IS  EQUATION  COUNT 
LESS  THAN  1/6  OF 
TOTAL  NUMBER. 


ML 


YES 


MATCH  WITH 

1  DUMMY  INSERTION 

INTO  CODESTK? 

2,    .    .    .,   m-1 

M 


YES 


YES 


vps 


JSQ_ 


_^exit  i 


-►EXIT  1 


>EXIT  1 
_^EXIT  1 


EXIT  2 


-►  EXIT  1 


+.EXIT  1 


^.EXIT  1 


-►  EXIT  2 


Figure  5„   Comparing  Compstk  and  Codestk  BLOCK  A  in  PREVIOUS  FLOWCHART 
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ENTRY, 


DOES  THE  INSTRUCTION 
HAVE  AN  INDEPENDENT 
VARIABLE  OR  A 
CONSTANT  AS  AN 
OPERAND? 


NO 


YES 


COMPARE  ONLY 
BITS  12-17  AND 
U2-l*7  WITH 
STACK.   SAME? 


NO 


SET  COUNTER  5  TO 
TEST  SAME 
INSTRUCTION  IN 
OTHER  STACK 
WITH  NEXT  IN 
COMPSTK 


COMPARE  ALL 
BITS  WITH 

YES 

STACK.   SAME? 

NO 
►  FAILURE 

TEST  NEXT 
INSTRUCTION 


_IES_ 


M 


0? 


YES 


[no 


FAILURE 


M  >  0? 


YES 


NO 


i 

INSERT  DUMMY 
INTO  OTHER 
STACK, 
m  =  m  +-  1 

INSERT  DUMMY 
INTO  COMPSTK 
m  =  m  -  1 


Figure  6.   Routine  to  Compare  COMPSTK  with  Another  Stack.  M  is  the  num- 
ber of  insertions  of  dummy  instructions' allowed. 
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3.U  Methods  of  Integration 

At  present  there  are  two  integration  methods  available  for 
the  integrator.  These  are  Fourth  Order  Runge  Kutta  and  a  corrector- 
predictor  method  investigated  by  Nordsieck. 

The  standard  Fourth  Order  Runge  Kutta  method  is  used.  This 
is  as  in  the  equations  below. 


K 


!  =  F  (Yn,  T) 


K2  =  F  (Yn  +.j  HK1,  T  +  H/2) 
K3  =  F  (Yn  +  |riK2,  T  +  H/2) 
K^  =  F  (Yn  +  HK3,  T  +  H) 

Yn+l  =  Yn4H^      ta2.+  2S*Sl) 


If  automatic  stepsize  adjustment  is  called  for,  spare  PE's, 
which  are  not  used  for  the  integration,  are  used  to  do  the  integration 
at  twice  the  stepsize.  The  truncation  error  can  then  be  estimated  from 
the  following  formula. 


E  =  Kh5  = 


^ry(h/2-y  (h)1 

31  I  ym   *  ym 


Stepsizes  are  always  changed  by  a  factor  of  two  and  certain 
limitations  are  put  on  the  changing  frequency  and  minimum- size  of  the 
step. 
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The  corrector-predictor  method  is  a  multistep  method  and 

thus  requires  another  method  to  initialize  it.  The  midpoint  rule  is 

used  in  the  integrator.  When  the  starting  values  y  _  and  y  _  are 

''n-l     n-2 

found,  they  are  premultiplied  by  a  matrix  as  follows: 


i 

0          0          0 

"yn 

0 

10          0 

hf   (yn) 

a     = 
-n 

0 

Vk     -1      Vk 

hf  K- 

0 

v6  -v3  v6 

hf  <v 

It  can  be  shown  that  the  resulting  a  is  equivalent  to 


[■ 


V  hyn'.h2ynT/2>  ^n"^ 


] 


The  formulas  for  the  integration  method  are  then: 


-n  +  1  = 


1    1 

0    1 
0    0 


0 


2    3 
1    3 


0    0 


1 


a 

-n  + 


-3/8 

-1 

-3/1+ 

-v6 


\ 


yn'  "      yn).  There  are  several  reasons  for  using  this 
method.  For  reasons  which  are  beyond  the  scope  of  this  explanation, 
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the  method  minimizes  roundoff  error.  Also,  due  to  the  Pascal  triangle 
form  of  the  matrix,  the  matrix  multiplication  can  be  reduced  to  a  series 
of  a  few  additions  thus  speeding  up  the  method.  The  biggest  advantage 
is  that  changing  the  stepsize  does  not  require  restarting  procedures. 
The  vector  a  simply  is  multiplied  by 


1 

0 

0 

0 

0 

a 

0 
2 

a 

0 

where  a 

old 

0 

0 

0 

h 
new 

9 

0 

0 

"I 

Each  of  these  methods  is  stored  as  a  skeleton  ILLIAC  IV  pro- 
gram which  is  used  to  form  the  complete  integrator  output. 
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k.      EVALUATION  AM)  SUGGESTIONS  FOR  FURTHER 
DEVELOPMENT  OF  THE  INTEGRATOR 


As  might  well  be  expected,  the  efficiency  of  the  ILLIAC  IV 
program  generated  by  the  integrator  system  varies  greatly  on  the  type 
of  input.  One  would  expect  it  would  not  do  as  well  on  equations  with 
highly  complicated  and  diverse  terms,  and  this  proved  to  be  true. 
Equations  such  as  those  involved  in  chemical  kinetics  problems  where 
the  terms  differ  only  in  two  or  three  multiplys  were  handled  very  well, 
approaching  the  maximum  efficiency  for  this  algorithm. 

The  integrator  is  also  not  well  suited  to  small  systems  of 
equations.  There  are  many  reasons  for  this.   Conventional  integration 
methods  are  rigidly  sequential,  so  the  work  other  than  function 
evaluating  can  only  use  as  many  PE's  as  there  are  equations.  Attempts 
to  separate  methods  into  usable  parallel  forms  have  been  unsuccessful 
to  date.  This  method  thus  leaves  many  of  the  PE's  unoccupied  when  doing 
the  method  calculations  for  small  systems.  Also,  with  smaller  systems, 
the  ratio  of  these  calculations  to  the  function  evaluation  is  high  and 
so  the  overall  integration  becomes  inefficient.   One  way  of  improving 
this  might  be  to  use  some  extrapolation  method,  such  as  Richardson's. 
A  lower  order,  and  hence  faster  method,  could  be  used  to  do  the  inte- 
gration, thus  decreasing  the  integration-to-function  evaluation  ratio. 
Since  this  involves  doing  the  integration  at  different  stepsizes,  more 
of  the  PE's  could  be  performing  useful  work  in  parallel.  However,  due 
to  the  sequential  nature  of  the  methods,  there  would  still  be  a  lot 


of  PE's  empty.  For  example,  consider  a  three  step  extrapolation  me- 
halving  the  stepsize  for  each  integration.  The  PE  versus  time  graph 
would  look  as  follows: 


PE 


time 


h 

h/2 

h\ 

▼ 

This  represents  the  time  involved  in  all  the  integration  and  function 
evaluation  except  for  the  evaluation  of  the  extrapolation  formula. 
In  this  case  40  per  cent  of  the  machine  time  available  is  being  used. 
In  the  best  case  with  only  two  stepsizes,  25  per  cent  of  the  machine  time 
is  unused,  and  the  savings  gained  by  an  extrapolation  from  two  cases 
would  probably  be  negligible  anyway.   One  is  tempted  to  move  part  of  the 
process  time  from  the  smaller  stepsize  to  the  empty  spaces  in  the  larger, 
but  this  cannot  be  done  due  to  the  aforementioned  sequential  nature  of 
the  methods.  The  starting  values  to  begin  the  integration  in  the  middle 
are  not  available  until  the  integration  has  been  performed  up  to  that 
point.  For  these  reasons  extrapolation  methods  seem  of  doubtful  value. 
There  are  other  problems  which  affect  small  sets  of  equations.   Since 
there  are  few  terms,  it  can  be  expected  that,  on  the  average,  there  will 
only  be  a  few  that  can  be  done  in  parallel.  Thus  the  function  evaluation 
may  be  inefficient  as  well. 


33 


With  larger  systems  the  results  are  significantly  better. 
There  are  fewer  wasted  PE's  during  the  integration  operations.  There 
are  more  terms  to  he  compared  and  to  fill  up  the  machine.  It  cannot, 
however,  be  guaranteed  that  100  per  cent  efficiency  will  be  obtained  even 
if  the  maximum  number  of  terms  can  be  done  in  parallel.  This  is  due  to 
the  fact  that  if  the  number  of  terms  is  not  an  even  multiple  of  6h, 
another  term  evaluation  must  be  performed  even  if  there  is  only  one 
extra  term.  The  following  chart  illustrates  the  maximum  efficiency  for 
different  numbers  of  terms. 


Maximum  terms 
per  PE 

Worst 
#terms 

case 

efficiency 

Average 

#terms 

efficiency 

1 

1 

15$ 

32 

50$ 

2 

65 

51$ 

96 

75$ 

h 

193 

76$ 

22^ 

87$ 

8 

kk9 

m 

i+8o 

9^$ 

20 

1217 

95$ 

12i+8 

98$ 

This  clearly  indicates  some  of  the  improvement  with  larger 
systems.  Breaking  up  large  terms  into  smaller  ones  might  improve  this 
although  there  will  be  some  added  overhead  in  keeping  track  of  the  extra 
terms  and  recombining  to  obtain  the  results. 

Another  item  which  causes  inefficiency  is  the  routing  which 
must  be  done  to  recombine  the  partial  sums  after  their  evaluation.  If 
the  equations  are  roughly  all  the  same  length,  the  routing  will  be 
minimal.  Strange  distributions  of  long  and  short  equations  might  greatly 


increase  the  routing.  An  example  might  be  a  set  where  the  first  thirty- 
equations  have  two  terms  each  and  the  last  thirty  have  twenty  terms  each. 
If  these  same  equations  were  alternated  -  one  short,  one  long,  one  short  - 
the  routing  would  he  reasonable.  It  would  be  desirable  to  have  the  in- 
tegrator system  do  this  type  of  exchange  before  analyzing  the  equations. 
Of  course,  an  intelligent  user  could  do  the  same  thing. 

Another  factor  is  the  depth  of  analysis  used  to  find  terms 
that  can  be  done  in  parallel.  Highly  complex  terms  are  difficult  to 
handle.   Consider  the  following: 

Bx  (Ax  SIN  (Y(l))  /  2  +  Y  (3)  *  3) 

(Dx  Y  (2)  -  SIN  (Y  (6))  x  Cx  Y  (2))  *  6 


It  is  clear  that  many  of  these  terms  could  be  done  in  parallel, 
but  highly  sophisticated  techniques  would  be  needed  to  do  this  auto- 
matically. To  do  this  high  a  level  analysis  on  every  term  might  take 
enormous  amounts  of  computer  time  and  might  only  be  worthwhile  on  smaller 
systems. 

As  previously  mentioned,  another  useful  addition  would  be  to 
have  routines  written  so  that  a  sine  and  a  tangent,  for  example,  could 
be  done  simultaneously.  This  would  seem  to  be  possible  since  the  com- 
mon methods  for  obtaining  these  functions  are  by  using  some  approximating 
polynomial.  It  might  be  possible  to  use  the  same  form  polynomial  with 
different  constants  to  get  different  functions.  It  should  be  noted  that 
sine  and  cosine  are  trivial  to  do  in  parallel  since  cos  x  =  sin  (x  +  ir/2). 
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It  goes  without  saying  that  an  integration  method  which  is 
less  sequential  would  he  valuable  in  improving  this  system. 

In  summary,  the  integrator  system  works  reasonably  well  on 
large  systems  of  equations  although  there  is  still  some  improvement 
possible.  For  small  systems  it  appears  that  a  different  algorithm  would 
need  to  be  used  to  get  really  good  results. 


APPENDIX  A 

SAMPLE  INPUT  PROBLEM  FOR  THE 
INTEGRATOR -TRANSLATOR  SYSTEM  (OUTPUT  LISTING) 


37 


II L I  AC     IV    THANSLATO*    W»ITI    fi    *Y*UM 

i.nnrs   co«pu.fk  •  vcastnn   i      n  »*or,MM  f, 

AK91L  30»  1969  *T  2013  t  «i« 


/does 


»LTST  HSvil) 

♦  P»TTr«M  t 
7(05  ■  »»v(0)"Yf rt)»P»Vf 0)»Y(0)*C"V(rt)»r(0)  I 

7<1)«KlKY(A)*A1xY(l)»AA,iY(3)xYM01  +  »2xY(1)*»3xYr?>*/U«Y(7)*»S>iYC,S)*«>00J 
7(2>«B3xY(i)xY(<n*:i?*Y<2)*'<3«Y(->)*H4xY(9)»Y(9)+e>SxYC2)»Y(V)*K9*Y(5)J 
7C3)»ClxY(3)>«Y(QWC?«Yr3)xY(9)+C3xYC3)+K3xY(«)xY(«n«K4xY(n*Kft*Yf8)xV(fn 

♦  K3*Y(8)xYf3)'A1x*(29)*K5txY(1)xY(2)*K6xYC?w*?xYC3)*Y(«>); 

?(«)>UlKVi3)«|)?«V(«|)«Y(9)*0)icY(i)HV(V)'  «x7w(t<)i(YrA)«KRKV(3)  I 

7(5)»K3xYC2)*E1x,"i2)xYr9l4»-2xYfnxY(9)»r3xY(/|)>«vrc))*f-4xY(5urSxY(b)  I 
7(ft)»f  l«Y(?)*Yci)Wf?xYr35KY{9)*FixV(a)xYCV1*r«)(Y(i«,)  *f  3hY(A)+ 

Fl»Y(13)xY(8)*Al<V<1')*Y(96)*«'JxY(12}xY(:9^**1«iY(9)*V4xYri3)xYC29)l 
7(7)«Glx,'(*')*(i?*Y(r)*r,  <KVf/)  ! 
7(«)»H1kY(  1  )    j 

7(9>«11kYU0)*T9xYi9)xV(S)»T3*Y(9)xY(MI 

7C10>»J1«Y(lO)*J,>«Yr10)*Y(7)».l3*Y<S)i,YMOWK2xYrH)  +  *.5784,'374» 
7(111  »KlxY(10)*f3xY(7)*K3xYf9)xY(M*K3xYfH  1  ; 

7(121»C1«Y(3)xY(Q)*f^xY(^)xYf  >H>0*Y(31*A1  X   Yf 0)+K4*Y(1  )*K4«Y(8)xY(9)J 
7(131«C1xY(?)xY(0;*C?xY(1)xY(9j»r<xY(i)*  HI  x"n)  +  KaxY(t)*K«xY(8)xY(9) 

♦  <'->xY(l)>«Y(9),K6xY(2)*i<2xV(31xY(S)»A2xY(1ft>xY(3  3)*H,>XY(8)xYn«> 
♦CJxYC?'>)xY(l^)*Jl¥Y(?«)xY(S)*A?xY(??)*AflxY(P)*K1xY(H)xY(?)l 

7(1«>»01xYf3)*n?«YCa)«Y(9)«13xY(0lxY(9)  ♦K7xY(»1xY(«)*K8i«Yf3)  I 
7(l5>aK3»Y(2)*riXYi'3)X>(9)»F9xV(4iXY(V)  +  rixYf4)«Yf9>*e4xY(5)*F$KY(,>)  » 

7( 1 61»F t xy( ?)xYf 9) *F?x> ( ^)XY(9)*F  <xY(4  )«V( 9)*>4xY( ft)  J 
7<17>  =  (il«r(6)*AlwYi,7)xY(«)Kj?xY(r>*f;.4xY(7)  | 
7(1B)«H1xY( 1 )   I 
7(19)»Jl*Y(10)*/»,»xY(M«Y(;>3)*J?*YMO>*Y(7)*jT*Yr,i>MY(10)*  <».0*-0?xY(3> 


*    ?oo 

1 

*      300 

2 

•     100 

j 

*    soo 

4 

»      ftOO 

5 

*     700 

6 

♦      800 

7 

*      900 

8 

*     1000 

9 

*     1100 

10 

*   1200 

1  1 

*     1  V»0 

12 

*     1(100 

13 

*     1*00 

14 

*    1600 

I1) 

*    1700 

16 

*    1  HOO 

17 

*     19  0  0 

18 

*           9000 

19 

*    3100 

30 

*   ??oo 

21 

*    3300 

22 

♦    9100 

23 

*     'SOO 

24 

*    9600 

25 

*    9700 

26 

♦  «..  3«4xY(  ?»i»*(  >n»  9tt*iA*v;is)i  •  ?*oo 

7(?0)iKi«»(iO)»v(M»K?«»(gii<r(M»»ur(ii)  i  i  >voo 

7(?l>"B3«Y<9)»V(1U)*k?«V(?7)*n«Y(10)«l?«'Y(9)»Y(«.)*13i«Y(9)»Tfm  •  3O0O 

7(??3«C?*V(3)>«V(9)»C?«y(l)*»(9)»C1»'V(i)KVC«1»«4«»f  ^»*Tta)»*««»C«3«T(t)l  »  31  00 

7(?3l"C?xY(3)>«Y(0)*f?xy(  3  )« y(«)  »c  (*Y(j)»Y(»)«if««Y-f?1i«Y(?S)»K«»Y(«)*vfV)  •  *?00 

♦  Fl'«Y(ni<Y(?1)*,<!>>«Y(?)»Y(?)*l<h«Yf?)*K;,KV(l)>rY(S)*»1»Y('J)»F«»Y(9)l<Y(1«)  •  3300 

♦  C1  «V  (8)*Y(0  )  +  f>?xY(  ^♦03»Y(i>)*0ft»Y(O)KV(  9i  I  •  *»00 

7<?«1»D?»Y(  3)+n?«Y(t  )xv(  0)«iO»V(«  )»Y(  V)  *K  7*  Y  (  «  >n  v  (  «  )  *X(jx  Y  f  3  )  |  •  1S00 

7(?51»K3xY(?>*F?xY(?1xY(  0  ">  ♦  f  ?»t(  3  1  »  Y  (  V  >  *F3xY(»  IxYf  9)*f  «xY(S  )♦(  SxY(5)  J    •  *AO0 

7(?6>»F?WY(?)XY(  9)»F?xv(  1)»Vf  <»)*f  >xy(  <.  V«vi9)*f  <>»Vf  6)  I  •  3M)0 

7(?73«G?«Y(6)*r,?*Y(7)*r,3«.>(  7)  i  •  3«oo 

7C?8)»H?xY(?)   I  «  3900 

7(?9V«J?xY(  ?  ))♦  ,|?»Yf  ?01xY(  7)«  j3*y(  S)x>(  70)4  *?»Y'  8)*  |1«V(  1  )xy(9)*  <|  ,  0  I     *  4000 

7(l())»I?«yf?n)M'«,'(9!M(SnM>Y(J)iM')t  *  «  1  0  0 

70(1>«  118,0  »  ♦  «?00 

70(2>«   T».o  I  •  «300 

70(31=   3  /• .  0 1  *  till 

70(«V»  1*900.01  ♦  «*>00 

70(5)=   O7.0   »  ♦  «600 

70(6)«   '9.0   ;  *  *TOQ 

70(M«  4*0,0   I  •  •■00 

70(8)»  n,f\       J  •  4900 

70(9)=   1,0   J  ♦  =-000 

70(1")=  •,"  i  •  Mao 

70(1 1 )»   0.0  I  ♦  ''POO 

70(1?)  =  3.S6«^  I  *  *-3G0 

70(H)  «  3«b67.M  ;  «  S«00 

70(1*)  ■  01  *  *">00 

70(1S)  1  II  *  SACO 

70(1*)  ■  .OOOOOOS  :  *  *700 

70(17)  »  3.009H7*C>SSI  •  =.800 
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?0(1«)    ■    ,9M     i 
70(19)    «    «.675J 

70(20)    .    01 

70(21  )    a    J45.67«S(i-1  ; 

70(2?)    «    335634S.0    J 

70(2-0    ■    3.0> 

70(20)    =    oj 

70(2*)    ■    01 

70(26)    x   o; 

70(27)    x    us 

70(28)    =    01 

70(2")    s    01 

70(30)=    o; 

«U   -l  ,nt 

*2=    -2.01 
A3"       .051 
44«    0, 1     > 
A5»    0,5ft    j 
A6«    T.  141  5    J 

"1      »      1 .0! 

R2«    -o.os; 
B3»    -1 .SI 

"«»   -o.otj 
R5»    -o.o*/ 
CI"   -o . re; 
C2x    -O.O-5    i 
C3x    -3,0       I 
nu      3.0        ; 
02«    -O.OOb    j 
D3«    -O.Oob    | 
Fix       0.01       I 


*    5900 

5rt 

*    6000 

59 

*    MOO 

60 

*    A?00 

61 

♦    *300 

62 

*    *  '(  0  0 

63 

♦    *500 

ftk 

*    6600 

65 

*    6700 

66 

*    6800 

67 

*    1*900 

6H 

*    7  0  00 

6V 

*    7100 

70 

*    7200 

71 

*    7300 

17 

*    7100 

73 

*    7500 

T* 

*    7*n0 

75 

*    7700 

76 

*    7800 

77 

*    '900 

76 

*    ROOO 

79 

*    PI  00 

80 

*     B  ?  0  0 

SI 

*   "loo 

82 

*    MOO 

83 

*   "boo 

84 

*    "600 

85 

*    8700 

86 

*    8800 

87 

*    «900 

86 

ko 


rim     o.o'     ; 
Tim      0.005    J 

r«»  -0.5*     t 

P5»  -0.033  I 

Pta  0.03   l 

r2m  0.09   f 

rim  0.00b  t 

P4«  -2,0    J 

fit"  2.0   I 

r,2»-o.oi     i 
r,3«-n.on   i 

HI-  7.0  I 

H2«  S,0«*V7H?/Sf 
T2«-O,01OU0  I 
T3»-O,00'>33  I 
J1--0.50      I 

J2»-o.00O5«  j 
J3»-O,00'50  | 
Kl«  0.00058  | 
H2a  0, 00933  I 
K3--1.0  I 
K4«-3.5«697364l 
K5a  -54S6,«85'i6» 
K6«  -345.56Pre-0'l> 
KP«  4567'. 999  I 
K8«  «5, 5*6*59  I 
:<9o  /(566.4«0?  I 
TO  ■  0  I 
TP  »1l 

TP  »o.o»o,?5»o.s»u.rs» 1 .0  I 

STEP  ■   0,05  J 


•   »0ftft 

•ft 

•    910ft 

*• 

•   •  •on 

♦t 

•    9100 

92 

♦    9460 

93 

♦    0SI>9 

•  4 

•     4400 

99 

•    9700 

ft* 

♦    9400 

•  P 

•    «9ft0 

9a 

♦   10000 

♦ft 

♦   10100 

10ft 

•   io?oo 

101 

♦   10300 

102 

*   10400 

101 

»   10500 

104 

♦    1O600 

109 

•   10  POO 

10ft 

•   loftOO 

10P 

♦   10900 

108 

•   11000 

109 

•   11106 

110 

♦   11700 

111 

♦   1 1 300 

lit 

•   11400 

111 

*  moo 

116 

*   1M00 

119 

•   11  POO 

11ft 

♦  imoo 

IIP 

•   11900 

HI 

♦   1?000 

lift 

SPrEn-UP   factor    is   b^.r 


CtlNGBATlJI  ATIHNS,        Nil    SnuT.f     PHHOWAM    Fhwnws     ,iFHF    DFTFCIFD     IN    PASS     1 
THF     TnT,»|.     IIMF     i'«<;  1     MINUTES     1  "5 .  *     SFrriNOS, 


TIME  F  "■*  1M11MI  7AT  I  "in: 

TIME  F1^  SCANNlMi".! 

TIME  Fin  PAN«I«Gi 

TIME  Fn*  SE»'*NTir     ACTI'lNSt 

TIME  fH  FWRIIR    RFCH»tKY; 

T  IMF  F  IK  OF  H1IG    P»  I  "IT  T  N  r*. : 


0    MlNMI  F  <;  13.?  SECONDS, 

0    MT>llT(  *  11.")  SECONDS. 

0    MlMlTfS  ?7.6  SECONDS. 

0    MlMiTES  7C.9  SECONDS, 

D  mtnhtfc  o.o  seconds. 

0  M  I  M I T  E  S  0.0  SECONDS. 


kl 


ENn 

01?34")*7B9*l»?f>>*AflCnFFr,Ht.  |  »  (<<->«  JKLMNOH  OR  **-!!<  /sruvwxv/ix 
1111111  1111111 1  Mill! 11 1 1  111  111  11  111  J  1111111111111  1  111100000 

1  11  111  111  11  11  1  1111  1  11  11  1  I  1  IH  II  II  1  )  1111  '.  1  11  1  11  11  11  11111  10000 

hi  uii  mi  in  ui  1 1 11  ii  1 1 1 1 1 1 1 1 1 1 1 1 1 1  n  ii  ii  ii  i  ii  inn  1 1  mono 

PERCFNT  *F  AVAILABLE  STn»AC,F  USED  IS  91. IS"1: 


l?tOO 


l?0 


120     rA^,)S     HTAD     AT  00     CARliS     'MR    'U^nTr. 


k2 


APPENDIX  B 

ILLIAC  IV  ASSEMBLY  LANGUAGE  OUTPUT 
PRODUCED  BY  THE  INTEGRATOR  FOR  THIS  SAMPLE  PROBLEM 


h3 


r,,lr)f"S  /MI'iASK 


I   ISTFO    -3Y    SNAP 


IN 


APHtL     30.     l9'-9       AT       20lJ9!44.tt 


*    THF 

* 

«  foiia 
t  t  ro 
I 

t 

HNFl 
HALF! 

Twm 

T  All  l 

.cnijMT 

.  INC  ! 

•TFMP? 
.TFMP3 
.LFFTt 
,N63« 

I 

ARIT^I 
F  X  P  S  «  t 
KSi 


SUM1  ! 

1UM2I 
SUM3I 

TNOEVl 


XKi 

XKTE»t 

''RACt 

INTEl 


TSnMi 
TNOX1 t 
FNRl 
PONSTl 


RFGlN 
FDILDwrNT    PK'lfjKAM    WAS    OPERATED    HY    TMF    GFMXOAL    Ul  FFk RFNT I *L 

TIo\    SOLVFP    I'Pnr.'jJM,        n     inIEOwaTFS    THF     TnpiiT    FuIIAUONS    USING 
URTM     UROFR     -ttlN'OF     K'lTTA      ■"FTMin. 


Pu„S 

F^u 
F*J 
F^O 

FOg 

F1J 
iFOj 

1F«U 
Fog 

ir^j 

IFIJ 
FOj 
FOj 

FTLI 
Rl  * 

FTLI 

Hi  * 

FTLI 
Rl  K 
FILL 
Rl  K 
Rl  * 

Rl  * 

Rl  < 
HI  K 


1  if 


FTLI 

nAiA 
mi  a 
niiA 
niTA 

«l  K 
PI  < 
Rl  * 
Rl  K 
FTLL 

HI  K 
Bl  K 
FOu 
F*U 


DATA 


r'iOO'.) 
■  4000 

=  '400  '.1 

=  'inon 

=  1 0  0  f) 

H«Sl 

j,  .  4  •• : 
i  >u? : 

bl  1 H  I 
i  » 4  V l 
»OS0  t 
S  )S?i 


r  i   'ir-uiw  c   u.ua.  . , 

i  u  0  o  o  o  o  o  o  o  o  o  o  o  0  (i  0  |  H  |  n 
0l?59S2r;>?5;">2525?lM;i! 
>:  4  n c  n o o i)  .t  o n  o 0 0 o u o  i  fl  :  k 
?400oooooooo00uuo  :  *  i  1 
1600n0OO3')OOOO0()O|R>^ 

*  i  r f  r   r  i  vi! 


N 1 1  H  H  F  R 


FLOATING    PHINt     HNF 
FLOATING    ONE     /SIXIH 
FLOATING    iINF    HALF 
Fl  (1MINP,     Ml] 
J  I  MF     I1FI   AYf I .5) 

RnnTTNG   rnuNu 

1  I  f '  o  »  T  I  n  N  s 

>/AL'if«    T n    he    STnwri) 


i  ri  i    mirr  l t *« r t 

M  A  S  *. 
MF'-1"Hr     li  \  T  A , 


1  'H  : 

2  0: 

I  *«> 
1  II 

1  >  M  !  * 

»i  5  :  * 
1  ">i  * 
•  >o;  f 
I  -1 

1  :? 

1  i* 

1  '<% 

1  ->  fl  ;  * 

j»  1  •' 

21»22 

jfi.  "> 

■      5  3 .  S 

1  >* 

1  :* 

1  :» 

1  :» 
i  <  4  ;  'i 

1  o :  * 

1)1* 

s  1   f  f  •  i 

=  1  777 


H  IJfM  Cj  ="     KIJTTA     PO'j^TiNTS 


SPRATPH     STUtrAP.F     f  CR    Fll'JMfNP,     S'lMS 

777 r/ rr ,'7/TTr n rrT tt\;r     fnarlf  hits 


i.l41S,n,i,-i.s.-O,O,l,-0,02»-1.0»-1.0»-34S,5677a-04» 
-). n OS. 41. S6nhS9» 0.02.-0. 033. 0.0  05 .0.03.0.00058, 
•'i.I)1,,"C.50»;).00?.j3»»1  .  0.-0.09  »-1.0» -0.0?*  "1  «5» 
-s  4S6.u«5»4»  -?.()» 3.  o»o.ooos8»-n,no5»-i,o»o,oo5>o,03» 

•:?,  0.-0,01  .-0.50.-0.  00?*>ft»  3.14|  ft»-1  .  0*  #-0,0?  » 

-f. 5469736 'I. -1. 03.-3. 5fl697364.-J.AS, 5677  »-04»-0. 56, 
-0,005.-0. 005.45. 566059, 0.02.-0. 033. 0.005, -0,01. 

-'i.noo^i»,-->.n. -o.O)  ooo.o.o  .O.n.o.o  i 


Oo 

oni  00 

00 

on?00 

00 

on  300 

no 

00400 

no 

3PS00 

no 

onAOO 

no 

■)o/00 

no 

onfloo 

00 

on  900 

00 

01  000 

no 

OH  00 

no 

01  200 

no 

01  300 

no 

0  1<4  0  0 

no 

01  SCO 

no 

0  1  is  o  o 

no 

0'  700 

no 

01  100 

no 

01  900 

no 

O'OOO 

00 

0?1  00 

00 

0??00 

no 

09300 

00 

0?400 

oq 

O'SOO 

no 

0?600 

ii; 

0?700 

no 

02800 

no 

02900 

00 

0*000 

no 

03100 

^0 

Oi?00 

00 

0>  iOO 

no 

O^'IOO 

00 

03500 

00 

O'AOO 

00 

01700 

no 

oifloo 

00 

0^900 

00 

0*000 

00 

04100 

no 

0«?00 

no 

0  0  3  0  0 

no 

O'lAOO 

no 

oasoo 

no 

0*600 

no 

0«700 

00 

0««00 

00 

04900 

oo 

o'-ooo 

no 

0"i100 

00 

0">?00 

00 

0S30O 

00 

0^400 

00 

o^soo 

00 

05600 

no 

OS700 

1 

2 

s 

4 

5 

A 

7 

H 

9 

10 

1  1 

12 

1.1 

14 

16 

\r 

10 
19 
20 
21 
22 
23 
24 
25 
26 
27 
20 
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APPENDIX  C 
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